BizTalk 2009 : Understanding the Message Bus

8/11/2011 11:47:32 AM

The Message Bus is the backbone of the BizTalk Server product. The bus is made up of distinct parts, each of which is explained later in the subsection "Messaging Components." The most obvious of these is the Messagebox, which is explained first. The others include the messages within the Messagebox and the messaging components that move messages to their proper endpoints.

1. The Messagebox

The Messagebox is simply a database. This database has many tables, several of which are responsible for storing the messages that are received by BizTalk. Each message has metadata associated with it called the message context, and the individual metadata items are stored in key/value pairs called context properties. There are context properties that describe all the data necessary to identify elements such as the following:

The inbound port on which the message was received.
The inbound transport type.
Transport-specific information such as ReceivedFileName in the case of the file adapter, InboundQueueName in the case of MSMQ or MQSeries, and so on.
Autogenerated internal MessageID of the message so it can be uniquely identified.
The schema type and namespace of the message, assuming it is an XML message using namespace#root as the message type. A common misconception is that the message type is always composed of the namespace and the root node name. In fact, this is not necessarily so. Richard Seroter, Microsoft MVP, explains this in his blog in greater detail (http://seroter.wordpress.com/2009/02/27/not-using-httpnamespaceroot-asbiztalk-message-type/).

Many people equate the Messagebox with the whole of the BizTalk Server messaging infrastructure. This is false; it is like saying that a database is nothing more than a set of data files sitting on a hard drive. The messaging infrastructure, or Message Bus, consists of a dozen or so interrelated components, each of which performs a specific job.

2. Messaging Components

When new architects start designing BizTalk solutions, few stop to think about how the messages are actually going to be received and delivered to their proper endpoints. This job belongs to the messaging components within BizTalk, each of which is explained next.

2.1. Host Services

A BizTalk host is nothing more than a logical container. Hosts let you arrange the messaging components of your application into groups that can be distributed across multiple processes and across machines. A host is most often used to separate adapters, orchestrations, and ports so that they run on separate machines to aid in load balancing.

A host instance is just that, an instance of the host. The instance is a Windows service that runs as a BTSNTSvc.exe process on the machine. This process provides the BizTalk engine with a place to execute and allows instances of different hosts to run on one machine at the same time. Each host instance appears as a separate BTSNTSvc.exe process in Windows Task Manager. If you examine the Windows Services control panel applet, you will find that each host configured on the machine shows up as a separate service named after the host. The host instance exists simply to give the BizTalk subservices a place to run. Most people think of the BizTalk service as a single unit, but really it is a container for multiple services, each of which is described in the following text.

The difference between an Isolated host and an In-Process host is that an Isolated host must run under another process, in most cases IIS, whereas an In-Process host is a complete BizTalk service on its own. Additionally, since Isolated hosts exist outside of the BizTalk environment, the BizTalk Administration Tools are not able to determine the status of these hosts (stopped, started, or starting).

Security is also fundamentally different in an Isolated host versus an In-Process host. In-Process hosts must run under an account that is within the In-Process host's Windows group, and they do not maintain security context within the Messagebox. For Isolated hosts, you normally create a separate account with minimum permissions, since Isolated hosts in most cases receive messages from untrusted sources such as the Internet.

Isolated hosts are useful when an external process already exists that will be receiving messages, either by some proprietary means or by some other transport protocol such as HTTP. IIS is a good example of such a process. In such cases, the Isolated host runs only one instance of the End Point Manager (EPM) and is responsible for receiving messages from its transport protocol and sending them to the Messagebox through the EPM. Outside of hosting an IIS process, Isolated hosts could be used to attach to a custom Windows service that polls a message store for new items to publish to the Messagebox. Isolated hosts provide an architectural advantage for these scenarios: they do not require any interprocess communication (IPC) between the EPM and the Windows service that hosts it. The only real IPC is between the Isolated host and the Messagebox database, which is most likely hosted on another machine.

In-Process hosts can host all BizTalk subservices depending on how they are configured. They not only can receive messages from the outside world, but they can send them through Send Adapters, poll for messages that match a subscription, and host XLANG engine instances. In the case of a Send Adapter, an In-Process host must be used because of how the security context of the Adapter Framework is built. To use adapters with Isolated hosts, the adapters have to use custom IPC. HTTP and SOAP adapters use this technique to interact with aspnet_wp.exe/w3wp.exe processes. Each Isolated host has the set of subservices running within it shown in Table 1. These services can also be viewed from the adm_HostInstance_SubServices table in the Management Database.

Table 1. Host Instance Subservices
Service	Description
Caching	Service used to cache information that is loaded into the host. Examples of cached information are assemblies that are loaded, adapter configuration information, custom configuration information, and so on.
End Point Manager	Go-between for the Message Agent and the Adapter Framework. The EPM hosts send/receive ports and is responsible for executing pipelines and BizTalk transformations.
Tracking	Service that moves information from the Messagebox to the Tracking Database.
XLANG/s	Host engine for BizTalk Server orchestrations.
MSMQT	MSMQT adapter service; serves as a replacement for the MSMQ protocol when interacting with BizTalk Server. The MSMQT protocol has been deprecated in BizTalk Server 2006 and should be used only to resolve backward-compatibility issues.

2.2. Subscriptions

To fully understand the Message Bus architecture, it is critical to understand how subscriptions work and what enlisting is. Subscriptions are the mechanism by which ports and orchestrations are able to receive and send messages within a BizTalk Server solution.

Each BizTalk process that runs on a machine has something called the Message Agent, which is responsible for searching for messages that match subscriptions and routing them to the End Point Manager (EPM), which actually handles the message and sends it where it needs to go. The EPM is the broker between the Messagebox and the pipeline/port/adapter combination. Orchestration subscriptions are handled by a different service called XLANG/s. These services are executed within the BTSNTSvc.exe process that runs on the host.

2.2.1. Subscribing

According to Microsoft, "A subscription is a collection of comparison statements, known as predicates, involving message context properties and the values specific to the subscription." [1] Predicates are inserted into one of the Messagebox's predicate tables, based on what type of operation is specified in the subscription being created. Note the list of predicate tables that follows; these are the same predicates that are used in the filter editor for defining filter criteria on ports. The list of tables is the same as the list of filter predicates because a filter expression is actually being used to build each subscription. When you define a filter expression, what you are actually doing is modifying the underlying subscription within BizTalk to contain the new filter information.

[1] Microsoft MSDN: http://msdn.microsoft.com/en-us/library/ms935116.aspx

BitwiseANDPredicates
EqualsPredicates
EqualsPredicates2ndPass
ExistsPredicates
FirstPassPredicates
GreaterThanOrEqualsPredicates
GreaterThanPredicates
LessThanOrEqualsPredicates
LessThanPredicates
NotEqualsPredicates

The BizTalk services create a subscription in the Messagebox by calling two stored procedures. These are bts_CreateSubscription_{HostName} and bts_InsertPredicate_{HostName}. The subscription is created based on which host will be handling the subscription, which is why these stored procedures are created automatically when the host is created in the Microsoft Management Console.
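The idea of a subscription as a set of predicates over context properties can be sketched in a few lines. This is a conceptual model only, not the Messagebox stored procedures: the function names, the property name Order.Quantity, and the AND-only evaluation are assumptions made for illustration (real filter expressions can also combine groups with OR, and the bitwise and two-pass tables are left out).

```python
import operator

# Comparison functions named after some of the predicate tables listed above.
OPERATORS = {
    "Equals": operator.eq,
    "NotEquals": operator.ne,
    "GreaterThan": operator.gt,
    "GreaterThanOrEquals": operator.ge,
    "LessThan": operator.lt,
    "LessThanOrEquals": operator.le,
}


def predicate_matches(context, prop, op, value=None):
    """Evaluate one predicate against a message context (a plain dict here)."""
    if op == "Exists":
        # Exists only checks that the property is present; the value is ignored.
        return prop in context
    if prop not in context:
        return False
    return OPERATORS[op](context[prop], value)


def subscription_matches(context, predicates):
    """Hypothetical subscription: every predicate must hold (AND only)."""
    return all(predicate_matches(context, *predicate) for predicate in predicates)


# Illustrative context and subscription; the values are made up.
context = {
    "BTS.MessageType": "http://schemas.abccompany.com#Request",
    "Order.Quantity": 10,
}
subscription = [
    ("BTS.MessageType", "Equals", "http://schemas.abccompany.com#Request"),
    ("Order.Quantity", "GreaterThanOrEquals", 5),
]
print(subscription_matches(context, subscription))
```

Keeping one comparison function per predicate type mirrors the one-table-per-predicate layout described above, which is why the dictionary keys reuse the table names.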

2.2.2. Enlisting

Most people ask what the difference is between enlisting a port and starting a port. The difference is simple. Enlisted ports have subscriptions written for them in the Messagebox, while unenlisted ports do not. The same is true for orchestrations. Artifacts that are not enlisted are simply in "deployment limbo": they are deployed and could process messages, but no way exists for the Messaging Engine to send them one. The main practical effect is that ports and orchestrations that are enlisted but not started will have any messages matching their subscriptions queued within the Messagebox, ready to be processed once the artifact is started. If the port or orchestration is not enlisted, message routing will fail, since no subscription is available, and a "No matching subscriptions were found for the incoming message" error is written to the Event Log. Be aware of a common and potentially risky situation that arises when you have more than one subscriber for a particular message type. In such cases, if the published message is routed to at least one of the subscribers, any unenlisted subscribers never get the message, and no error is raised, since the message satisfied another subscriber.
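The three outcomes described above (routing failure, queuing for an enlisted but stopped artifact, and the silent miss for an unenlisted one) can be modelled with a small simulation. The Subscriber class, the publish function, and the message type "T" are hypothetical; this is a sketch of the behavior, not of how the Messaging Engine is implemented.

```python
class RoutingFailure(Exception):
    """Raised when a published message matches no subscription at all."""


class Subscriber:
    """Hypothetical stand-in for a port or orchestration."""

    def __init__(self, name, message_type):
        self.name = name
        self.message_type = message_type
        self.enlisted = False  # enlisted: a subscription exists
        self.started = False   # started: messages are actually delivered
        self.queued = []       # messages held until the artifact starts
        self.processed = []

    def enlist(self):
        self.enlisted = True

    def start(self):
        if not self.enlisted:
            raise ValueError("an artifact must be enlisted before it is started")
        self.started = True
        # Anything queued while the artifact was stopped is now processed.
        self.processed.extend(self.queued)
        self.queued.clear()


def publish(message_type, body, subscribers):
    """Deliver to every enlisted subscriber; fail only if nobody matches."""
    matches = [s for s in subscribers
               if s.enlisted and s.message_type == message_type]
    if not matches:
        raise RoutingFailure(
            "No matching subscriptions were found for the incoming message")
    for s in matches:
        (s.processed if s.started else s.queued).append(body)
    return [s.name for s in matches]
```

With two subscribers for the same type and only one enlisted, publish succeeds and the unenlisted subscriber is skipped without any error, which is exactly the risky case described above.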

When a port is enlisted, the Message Agent will create subscriptions for any message whose context property for TransportID matches the port's transport ID. For orchestrations, it also creates the subscription based on the MessageType of the message that is being sent to the port within the orchestration. Binding an orchestration port to a physical send port will force the EPM to write information about that binding to the Management Database. Should the orchestration send messages through its logical port to the physical port, it will include the transport ID in the context so that the message is routed to that specific send port.

The next point is related to the pub/sub nature of the Message Bus. Since any endpoint with a matching subscription can process the message once it is sent from an orchestration to the send port, it is possible for multiple endpoints to act upon that message. This is critical to understand. Sending a message through an orchestration port to a bound physical port simply guarantees that a subscription will be created so that the message is routed to that particular endpoint. There is nothing that says no other subscriber may also act on that message. Developers often overlook this point. Most people assume that since the port is bound, the message simply ends up at the correct send port by magic. In reality, all that is happening is that the Message Agent is writing a subscription that hard-codes the context properties of that message so that it will always end up at least at that particular send port. Sending the message through the orchestration port simply publishes the message in the Messagebox, and the engine and subscriptions take care of the rest, so you do not have to publish a message over and over again in order to reach multiple targets.
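A short sketch makes the point concrete: a bound send port and an unrelated send port that filters on message type both receive the same published message. The port names and the transport ID value "port-42" are invented for illustration, and equality matching on a dict is a simplification of real subscription evaluation.

```python
def route(context, subscriptions):
    """Return every endpoint whose subscription matches the message context."""
    return [
        name
        for name, predicate in subscriptions.items()
        if all(context.get(key) == value for key, value in predicate.items())
    ]


subscriptions = {
    # Subscription written for the send port bound to the orchestration port.
    "BoundSendPort": {"TransportID": "port-42"},
    # An independent send port whose filter is based on message type.
    "AuditSendPort": {"MessageType": "http://schemas.abccompany.com#Request"},
}

# Context of a message sent through the bound orchestration port.
context = {
    "TransportID": "port-42",
    "MessageType": "http://schemas.abccompany.com#Request",
}

print(route(context, subscriptions))
```

The binding guarantees that BoundSendPort is in the result, but nothing in the routing step excludes AuditSendPort.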

2.3. Messages

A message within BizTalk is more than just a direct representation of the document received from the outer world. BizTalk has a model where messages contain both data and context. Understanding how messages are stored internally within the Messagebox is crucial to understanding how to architect systems that take advantage of how the product represents messages internally.

2.3.1. What Is a Message?

A message is a finite entity within the BizTalk Messagebox. Messages have context properties and zero-to-many message parts. Subscriptions match particular context properties for a message and determine which endpoints are interested in processing it. As mentioned before, there is one critical rule that will never change:

Messages are immutable once they are published.

Many people who have worked with BizTalk for years do not fully understand this rule. A message cannot be changed once it has reached the Messagebox. At this point most developers would say rather proudly, "But what about a pipeline component? I can write a pipeline component that modifies the message and its payload along with the context properties, right?" The answer is already in the question itself. Modifying the message can be done only in a pipeline, either sending or receiving. A receive pipeline modifies the message before it gets to the Messagebox. At the end of the pipeline, the message is published. A send pipeline operates on the message after it leaves the Messagebox and before it is sent out. The original message is still unmodified in the Messagebox database regardless of what the send pipeline decides to do with the message.
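The rule can be illustrated with a frozen data class: the receive pipeline shapes the message before publication, and the send pipeline works on a copy afterward. All names here (Message, receive_pipeline, send_pipeline, the in-memory messagebox list) are hypothetical and only model the behavior described above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Message:
    """Immutable stand-in for a published message."""
    body: str
    context: tuple = ()  # (name, value) pairs; a tuple keeps the context read-only


messagebox = []  # stand-in for the Messagebox database


def receive_pipeline(raw):
    """Runs before publication, so it is free to shape body and context."""
    return Message(body=raw.strip(), context=(("ReceivedFileName", "order.xml"),))


def publish(message):
    """Store the message; from here on it can no longer be changed."""
    messagebox.append(message)
    return message


def send_pipeline(published):
    """Runs after the message leaves the Messagebox and returns a new copy."""
    return replace(published, body=published.body.upper())


stored = publish(receive_pipeline("  <Request/>  "))
outbound = send_pipeline(stored)
print(messagebox[0].body, outbound.body)
```

Trying to assign to stored.body raises an error, while the copy produced by the send pipeline can differ freely from what remains in the store.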

2.3.2. Messages vs. Message Parts

Messages are composed of zero or more message parts. All messages with parts generally have a part that is marked as the body part. The body part of the message is considered to contain the data or "meat" of the message. Many adapters examine only the body part and ignore any other parts of a multipart message, that is, a message containing more than one document. A multipart message can have one "body" and any number of additional parts. The closest analogy is an email message with attachments.

If you look at the Messagebox database, there are two specific tables, one that holds all messages that flow through BizTalk and one that holds all the message parts. This zero-to-many relationship implies something: message parts can be reused in multiple messages. And that is absolutely true. Each message part has a unique part ID that is stored in the MessageParts table and is associated with the message ID of the main message.

It is also important to understand that message parts contain message bodies, which are generally XML based. If a message is received on a port that uses a pass-through pipeline, then the message can be anything, including binary data. When using a pass-through pipeline, a Receive Adapter stamps its values into the message context, but no properties can be promoted from the data of the message. If you think about it, this is obvious. When you are accepting binary data, BizTalk has no mechanism to examine the message body part and determine the message type, so how can it promote it? In this case, the message will contain one message part whose message body is a stream of binary data.
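The relationship between messages and parts can be sketched with two in-memory "tables." The layout below is an assumption for illustration and does not reflect the actual Messagebox schema; the function names and part names are hypothetical.

```python
import uuid

parts = {}     # part_id -> raw bytes (stand-in for the parts table)
messages = {}  # message_id -> list of (part_name, part_id, is_body)


def add_part(data):
    """Store a part once and return its unique part ID."""
    part_id = str(uuid.uuid4())
    parts[part_id] = data
    return part_id


def add_message(part_refs):
    """Create a message that references zero or more stored parts."""
    message_id = str(uuid.uuid4())
    messages[message_id] = list(part_refs)
    return message_id


def body_of(message_id):
    """Return the data of the part marked as body, or None if there is none."""
    for _name, part_id, is_body in messages[message_id]:
        if is_body:
            return parts[part_id]
    return None


invoice = add_part(b"<Invoice/>")
scan = add_part(b"\x00\x01\x02")  # binary attachment

first = add_message([("Body", invoice, True), ("Attachment", scan, False)])
second = add_message([("Body", invoice, True)])  # reuses the same part
```

Both messages resolve to the same stored body because they reference one part ID, which is the reuse that the zero-to-many relationship allows.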

2.4. Message Context Properties

Message context properties are defined in what is called a property schema. The properties themselves are then stored into context property bags. The context property bags are simply containers for the properties, which are stored as key/value pairs.

2.4.1. Context Property Schemas

The schema of the inbound message is used by BizTalk Server to associate it with any corresponding property schemas. There is a global properties schema every message can use by default that contains system-level properties. It is possible to create custom properties schemas that can define application-specific and typically content-based properties that may be required such as an internal organizational key, the customer who submitted the document, and so on.

System-level properties defined within global property schemas are essentially the same as custom context properties defined within a custom property schema. Both types have a root namespace that is used to identify the type of property, and both are stored within the context property bag for a given message. In reality there is no real difference to the runtime in terms of whether a context property is a "system-level" property or a "custom" property.

Context properties, whether they are system or custom properties, define part of the subscription that is used to evaluate which endpoint(s) have a valid subscription to the message. The most common message subscription is based on the message type. BizTalk typically identifies the message type in the message context as the XML namespace of the message and the root node name, joined by a # character. For example, say that you had a document with the declaration in Example 1.

Example 1. XML Order Request Sample Document

<ns0:Request xmlns:ns0="http://schemas.abccompany.com">
    <Header>
        <ReqID>4</ReqID>
        <Date>6/6/2005</Date>
    </Header>
    <Item>
        <Description>Description_0</Description>
        <Quantity>10</Quantity>
        <UnitPrice>2</UnitPrice>
        <TotalPrice>2</TotalPrice>
    </Item>
</ns0:Request>

The BizTalk message type in this example would be http://schemas.abccompany.com#Request. For message type-based subscriptions, the subscription would then be evaluated by the Message Agent to determine whether any endpoints have subscriptions for the message in question. The list of all subscriptions can be viewed within the BizTalk MMC snap-in tool by viewing all the subscriptions within the solution. Figure 1 shows that each of the message properties can be viewed within the BizTalk Administration Console and selected in the message properties drop-down list, which can be used to search for messages within the tool.
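The namespace#root convention is easy to reproduce outside BizTalk. The following sketch derives the message type string from a document using Python's standard XML parser; the function name is hypothetical, and the fallback for a root element without a namespace is an assumption rather than documented BizTalk behavior.

```python
import xml.etree.ElementTree as ET


def message_type(xml_text):
    """Build 'namespace#root' from the root element of an XML document."""
    root = ET.fromstring(xml_text)
    # ElementTree renders a qualified tag as "{namespace}localname".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
        return f"{namespace}#{local_name}"
    # Assumption: with no namespace, fall back to the root name alone.
    return root.tag


sample = """
<ns0:Request xmlns:ns0="http://schemas.abccompany.com">
    <Header>
        <ReqID>4</ReqID>
        <Date>6/6/2005</Date>
    </Header>
</ns0:Request>
"""

print(message_type(sample.strip()))
```

Only the root element matters for the result; child elements such as Header play no part in the message type.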

NOTE

The message context properties will only be available if the XML or flat-file pipelines were used. If the pass-through pipeline processed the message, no properties would be available for searching in the BizTalk Administration Console.

Figure 1. BizTalk Administration Console

Using subscriptions to route documents to the proper endpoints is called content-based routing (CBR). Having a thorough understanding of the pub/sub nature of the BizTalk Message Bus is crucial when designing any large messaging-based application, especially in situations where there is going to be a significant amount of routing between organizations and trading partners.

2.4.2. The Context Property Bag

As stated previously, context properties are simply key/value pairs stored in an object that implements the IBasePropertyBag interface. As you can see in the following code and in Table 2, the definition of the interface is quite simple:

<Guid("fff93009-75a2-450a-8a39-53120ca8d8fa")>
<InterfaceType(ComInterfaceType.InterfaceIsIUnknown)>
Public Interface IBasePropertyBag

Table 2. IBasePropertyBag Interface Definition
Public Properties	--
CountProperties	Gets the number of properties in the property bag
Public Methods	--
Read	Reads the value and type of the given property in the property bag
ReadAt	Reads the property at the specified index value in the property bag
Write	Adds or overwrites a property in the property bag

Given that the context property bag is such a simple structure, it is possible to use the BizTalk API to write any property you want into the property bag. Note that this does not require the property be promoted. Writing a property into the property bag does not mean it is promoted and available for message routing. If a value needs to be visible to the Message Bus for routing purposes, it has to be promoted. By using the property schema to promote a property, either by using a custom property schema or by promoting a value into a property defined in the global property schemas, what you are doing is first writing the value into the property bag and then marking it as promoted. When adding context values from within a pipeline component, you should be aware that there are different API calls for simply writing properties and actually promoting them.
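The distinction between writing and promoting can be modelled as a property bag plus a set of promoted keys. The MessageContext class below is a hypothetical Python model, not the BizTalk API; its write and promote methods are named after the corresponding calls a pipeline component would make, and the namespace and property names are invented.

```python
class MessageContext:
    """Hypothetical model: a property bag plus a record of promoted keys."""

    def __init__(self):
        self._values = {}       # (namespace, name) -> value
        self._promoted = set()  # keys that routing is allowed to see

    def write(self, name, namespace, value):
        """Store a value in the bag without making it routable."""
        self._values[(namespace, name)] = value

    def promote(self, name, namespace, value):
        """Store the value first, then mark it as promoted."""
        self.write(name, namespace, value)
        self._promoted.add((namespace, name))

    def read(self, name, namespace):
        return self._values.get((namespace, name))

    def routable_properties(self):
        """Only promoted properties are visible to subscription matching."""
        return {key: value for key, value in self._values.items()
                if key in self._promoted}


NS = "http://example.org/property-schema"  # hypothetical property schema namespace

ctx = MessageContext()
ctx.write("BatchId", NS, "B-17")         # readable, but not used for routing
ctx.promote("CustomerKey", NS, "ACME")   # readable and available for routing
print(ctx.routable_properties())
```

Both values can be read back from the bag, but only CustomerKey would take part in routing, which matches the two-step description of promotion above.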

It is critical to understand that everything that is written to the property bag is visible within the MMC. Likewise, it is quite easy to view the subscription information for any ports that route on context properties. If you are promoting properties into the message context, make sure that they do not contain any sensitive data. For example, if you have a field in a schema that contains credit card numbers, do not promote this value without taking precautions. If you do store the credit card information in a schema, make sure to mark it as sensitive within the schema definition. This will cause the BizTalk runtime to throw an error should that element's value be promoted. If it is absolutely necessary to promote this value, make sure you encrypt it using a third-party tool.